
[TRTLLM-13250][feat] Wave 5: Enable MX post-transform Llama receiver#15432

Open
chienchunhung wants to merge 4 commits into NVIDIA:main from chienchunhung:codex/staged-hooks-wave5-mx-publisher

Conversation

@chienchunhung

@chienchunhung chienchunhung commented Jun 16, 2026

Collaborator

Summary

Stacked on Wave 4 / #15387.

This implements Wave 5 of the staged post-load hook rollout for MX:

  • publish MX sources after post-load transforms with SourceIdentity and transform-layout metadata
  • let compatible, allow-listed Llama receivers consume post-transform MX bytes and run only setup_aliases() + cache_derived_state()
  • fail closed before P2P when SourceIdentity is missing/mismatched, transform protocol metadata is unsupported, or the model is not allow-listed
  • add unit coverage for metadata fallback cases, publish metadata, GMS/MX post-load publish ordering, and a tiny real-Llama staged receiver equivalence check
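The fail-closed gate described above can be sketched as a pure decision function. This is a minimal illustration under stated assumptions: the `SourceIdentity` fields, the metadata keys, the protocol version `1`, and the `"LlamaForCausalLM"` allow-list entry are all hypothetical placeholders, not the PR's actual definitions. What it shows is the shape of the check: every condition must pass before P2P is attempted, and a missing or unparsable field counts as incompatible.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Mapping, Optional

# Hypothetical constants: the real keys, protocol versions, and allow-list live in the PR.
SUPPORTED_TRANSFORM_PROTOCOLS = (1,)
STAGED_RECEIVER_ALLOWLIST = ("LlamaForCausalLM",)
IDENTITY_KEY = "source_identity"
LAYOUT_KEY = "weight_layout"
PROTOCOL_KEY = "transform_protocol_version"


@dataclass(frozen=True)
class SourceIdentity:
    """Hypothetical identity fields; the real SourceIdentity may carry more."""
    model_name: str
    revision: str
    tp_size: int
    quant_algo: Optional[str] = None


def build_publish_metadata(identity: SourceIdentity) -> Dict[str, object]:
    """Metadata a publisher could attach after its post-load transforms ran."""
    return {
        LAYOUT_KEY: "post_transform",
        PROTOCOL_KEY: 1,
        IDENTITY_KEY: json.dumps(asdict(identity), sort_keys=True),
    }


def parse_identity(metadata: Mapping[str, object]) -> Optional[SourceIdentity]:
    """Return None for a missing or malformed identity instead of raising."""
    raw = metadata.get(IDENTITY_KEY)
    if not isinstance(raw, str):
        return None
    try:
        return SourceIdentity(**json.loads(raw))
    except (TypeError, ValueError):  # bad JSON, wrong shape, or unknown fields
        return None


def can_use_staged_receiver(metadata: Mapping[str, object],
                            local_identity: SourceIdentity,
                            architecture: str) -> bool:
    """Fail closed: any doubt means no P2P of post-transform bytes."""
    if architecture not in STAGED_RECEIVER_ALLOWLIST:
        return False
    if metadata.get(LAYOUT_KEY) != "post_transform":
        return False
    if metadata.get(PROTOCOL_KEY) not in SUPPORTED_TRANSFORM_PROTOCOLS:
        return False
    remote = parse_identity(metadata)
    return remote is not None and remote == local_identity


if __name__ == "__main__":
    local = SourceIdentity("llama-3-8b", "rev-abc", tp_size=2)
    meta = build_publish_metadata(local)
    print(can_use_staged_receiver(meta, local, "LlamaForCausalLM"))     # True
    print(can_use_staged_receiver(meta, local, "Qwen3MoeForCausalLM"))  # False
```

Returning `False` rather than raising keeps the caller's choice simple: a `False` routes to the ordinary disk load and full post-load path.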

Dependency / prerequisite stack

This PR is Wave 5 in the staged post-load hooks rollout. The foundation PRs #14770 and #14878 are already merged. The wave PRs should merge in sequence; after each upstream wave lands, rebase the next wave onto main so review and CI focus on that wave's delta.

Arrows point from prerequisite to dependent. PR numbers in graph nodes are clickable.

graph TD
    PR14770["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14770'>#14770</a>: staged-hook contract (merged)"]
    PR14878["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14878'>#14878</a>: GMS SourceIdentity gate (merged)"]
    PR15014["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15014'>#15014</a>: Wave 1 aliases + GMS RO load (open)"]
    PR15288["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15288'>#15288</a>: Wave 2 Linear/Attention transforms (draft)"]
    PR15386["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15386'>#15386</a>: Wave 3 MoE/Mamba staged hooks (draft)"]
    PR15387["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15387'>#15387</a>: Wave 4 MX receiver cutover (draft)"]
    PR15432["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15432'>#15432</a>: Wave 5 MX publisher + Llama receiver (this PR, draft)"]
    VERIFY["post-migration verification / demo (planned)"]

    PR14770 -->|satisfied| PR15014
    PR14878 -->|satisfied| PR15014
    PR15014 -->|blocking| PR15288
    PR15288 -->|blocking| PR15386
    PR15386 -->|blocking| PR15387
    PR15387 -->|blocking| PR15432
    PR15432 -.->|planned| VERIFY

    classDef merged fill:#dcfce7,stroke:#16a34a,color:#14532d;
    classDef inflight fill:#dbeafe,stroke:#2563eb,color:#1e3a8a;
    classDef draft fill:#ffedd5,stroke:#f97316,color:#7c2d12;
    classDef current fill:#ede9fe,stroke:#7c3aed,color:#3b0764,stroke-width:3px;
    classDef downstream fill:#f3f4f6,stroke:#6b7280,color:#374151,stroke-dasharray:5 5;
    linkStyle 0,1 stroke:#16a34a,stroke-width:2px;
    linkStyle 2,3,4,5 stroke:#ea580c,stroke-width:3px;
    linkStyle 6 stroke:#6b7280,stroke-width:2px,stroke-dasharray:5 5;

    class PR14770,PR14878 merged;
    class PR15014 inflight;
    class PR15288,PR15386,PR15387 draft;
    class PR15432 current;
    class VERIFY downstream;

Immediate merge dependency for this PR: #15387 must land first; after Wave 5 lands, run the post-migration verification/demo for the completed staged-hook rollout.

Validation

  • git diff --check
  • python -m py_compile tensorrt_llm/_torch/models/checkpoints/mx/checkpoint_loader.py tensorrt_llm/_torch/pyexecutor/model_loader.py tests/unittest/_torch/models/checkpoints/mx/test_mx_checkpoint_loader.py tests/unittest/_torch/pyexecutor/test_model_loader_gms.py tests/unittest/_torch/pyexecutor/test_model_loader_mx.py tests/unittest/_torch/weight_sharing/test_mx_source_identity_gate.py
  • pre-commit on commit, with the waive-list check and validate-test-lists skipped locally because scripts/check_test_list.py fails under the hook's interpreter with TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

Focused pytest runs are blocked in this local environment: collection fails because the transformers package is not installed.
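The pre-commit `TypeError` above is what Python 3.9 and older raise when PEP 604 union syntax such as `str | None` is evaluated at runtime (for example, in an annotation without `from __future__ import annotations`); Python 3.10 added `type.__or__`. Assuming the hook interpreter is such an older Python, the sketch below detects the limitation and shows two annotation spellings that still import on 3.9. The function names are hypothetical; nothing here is taken from `scripts/check_test_list.py`.

```python
import sys
from typing import Optional


def pep604_unions_supported() -> bool:
    """Return True if `int | None` can be evaluated at runtime (Python 3.10+)."""
    try:
        _ = int | None  # on 3.9: unsupported operand type(s) for |: 'type' and 'NoneType'
    except TypeError:
        return False
    return True


# Portable spellings for a script that must still import on Python 3.9
# (hypothetical helper names, shown only for the annotation style).
def find_waiver(test_id: Optional[str] = None) -> Optional[str]:
    return test_id


def find_waiver_quoted(test_id: "str | None" = None) -> "str | None":
    # A string annotation is stored as-is and never evaluated at definition time.
    return test_id


if __name__ == "__main__":
    print(sys.version_info[:2], "PEP 604 at runtime:", pep604_unions_supported())
```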

Summary by CodeRabbit

  • New Features

    • Added support for staged post-transform weight transfers in ModelExpress, with source identity verification.
    • Introduced source identity gating to validate weight compatibility across distributed systems.
  • Improvements

    • Refactored weight transformation pipeline to separate weight transformation and state caching phases for improved clarity and maintainability.

@chienchunhung

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #54683 [ run ] triggered by Bot. Commit: 756e717 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #54683 [ run ] completed with state FAILURE. Commit: 756e717
/LLM/main/L0_MergeRequest_PR pipeline #43714 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch 2 times, most recently from ae210cb to f123c77 Compare June 18, 2026 00:37

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #54888 [ run ] triggered by Bot. Commit: f123c77 Link to invocation

@chienchunhung chienchunhung changed the title [TRTLLM-13250][feat] Enable MX post-transform Llama receiver [TRTLLM-13250][feat] Wav3 5: Enable MX post-transform Llama receiver Jun 18, 2026
@tensorrt-cicd

Collaborator

PR_Github #54888 [ run ] completed with state SUCCESS. Commit: f123c77
/LLM/main/L0_MergeRequest_PR pipeline #43893 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch from f123c77 to 14a4537 Compare June 19, 2026 01:32

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #54955 [ run ] triggered by Bot. Commit: 14a4537 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #54955 [ run ] completed with state SUCCESS. Commit: 14a4537
/LLM/main/L0_MergeRequest_PR pipeline #43955 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@chienchunhung chienchunhung changed the title [TRTLLM-13250][feat] Wav3 5: Enable MX post-transform Llama receiver [TRTLLM-13250][feat] Wave 5: Enable MX post-transform Llama receiver Jun 21, 2026
@chienchunhung chienchunhung marked this pull request as ready for review June 22, 2026 18:04
@chienchunhung chienchunhung requested review from a team as code owners June 22, 2026 18:04
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Contributor


📝 Walkthrough


The PR refactors the model weight-loading lifecycle across all module and model classes by splitting post_load_weights() into three ordered hooks: setup_aliases() (cross-layer tensor aliasing), transform_weights() (idempotent, guarded by _weights_transformed), and cache_derived_state() (derived-state recomputation). It also adds an MX (Model Express) staged post-transform receiver path to ModelLoader that accepts already-transformed weights only after source-identity metadata verification and compatibility gating, and it reorders the GMS read-only post-load sequence so alias setup runs before materialization.
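The transform/cache split and the `_weights_transformed` guard can be illustrated on a toy module. In this sketch (plain Python, no torch), `ToyLinear`, its doubling "transform", its max-abs "derived state", and `receive_transformed()` are hypothetical stand-ins, not TensorRT-LLM code. It shows `transform_weights()` becoming a no-op on repeat calls, `cache_derived_state()` always recomputing, `post_load_weights()` composing the two, and a fresh load resetting the guard (the change table notes that reload() resets `_weights_transformed`). Model-level `setup_aliases()` is omitted.

```python
from typing import List, Optional


class ToyLinear:
    """Hypothetical module showing the transform/cache split (not TensorRT-LLM code)."""

    def __init__(self, raw: List[float]) -> None:
        self.raw = list(raw)
        self.weight: Optional[List[float]] = None
        self.max_abs: Optional[float] = None  # stand-in for "derived state"
        self._weights_transformed = False
        self.transform_calls = 0

    def load_weights(self, raw: List[float]) -> None:
        self.raw = list(raw)
        self._weights_transformed = False  # fresh checkpoint bytes need transforming again

    def receive_transformed(self, weight: List[float]) -> None:
        # Staged-receiver path: bytes already arrive in post-transform layout.
        self.weight = list(weight)
        self._weights_transformed = True

    def transform_weights(self) -> None:
        if self._weights_transformed:
            return  # idempotent: never transform the same bytes twice
        self.weight = [2.0 * w for w in self.raw]  # stand-in for a real layout/quant transform
        self._weights_transformed = True
        self.transform_calls += 1

    def cache_derived_state(self) -> None:
        # Always recomputed from whatever weights are present now.
        if self.weight is None:
            raise RuntimeError("weights are not loaded")
        self.max_abs = max(abs(w) for w in self.weight)

    def post_load_weights(self) -> None:
        self.transform_weights()
        self.cache_derived_state()


m = ToyLinear([1.0, -3.0])
m.post_load_weights()
m.post_load_weights()  # second call does not double the weights again
print(m.weight, m.max_abs, m.transform_calls)  # prints: [2.0, -6.0] 6.0 1
```

Keeping the guard on the module rather than in the loader means any caller, including a legacy `post_load_weights()` call, gets the same once-only behavior.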

Changes

Weight-load Lifecycle Refactor and MX Staged Receiver

Layer / File(s) Summary
Base lifecycle hook contracts
tensorrt_llm/_torch/modules/linear.py, tensorrt_llm/_torch/modules/fused_moe/interface.py, tensorrt_llm/_torch/modules/fused_moe/quantization.py, tensorrt_llm/_torch/models/checkpoints/base_checkpoint_loader.py
LinearMethodBase gains transform_weights() (no-op default); MoE gains transform_weights() (once-guarded) and cache_derived_state(); FusedMoEMethodBase splits post_load_weights() into transform_weights() + cache_derived_state(); BaseCheckpointLoader adds is_post_transform_weights_preloaded() returning False.
Module-level hook implementations
tensorrt_llm/_torch/modules/linear.py, tensorrt_llm/_torch/modules/attention.py, tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py, tensorrt_llm/_torch/attention_backend/sparse/dsa.py, tensorrt_llm/_torch/models/modeling_llama_min_latency.py, tensorrt_llm/_torch/modules/fused_moe/...
Linear, MLA, Mamba2Mixer, Indexer, Llama4MinLatencyGatedMLP/MoE, ConfigurableMoE, and all fused MoE backends (Cutlass, DenseGEMM, Triton, TRTLLMGen, WideEP, MegaMoE) add _weights_transformed guards, extract transform_weights()/cache_derived_state(), and redirect post_load_weights() to those two methods. Multiple quant method subclasses rename their post_load_weights override to transform_weights or cache_derived_state.
Model-level post_load_weights → setup_aliases renames
tensorrt_llm/_torch/models/modeling_llama.py, modeling_deepseekv3.py, modeling_exaone_moe.py, modeling_glm.py, modeling_gpt_oss.py, modeling_qwen3_moe.py, modeling_qwen3_next.py
All model classes whose post_load_weights only wired cross-layer norm aliases rename the method to setup_aliases; inline comments referencing post_load_weights updated to setup_aliases.
MX checkpoint loader: staged post-transform receiver and publishing
tensorrt_llm/_torch/models/checkpoints/mx/checkpoint_loader.py
Adds _post_transform_weights_preloaded / _source_identity_compatible_for_last_load flags and is_post_transform_weights_preloaded property; rewrites load_weights() to fetch source metadata, perform metadata-based identity compatibility gating, check weight-layout state (pre/post-transform), fall back to disk on incompatibility or partial failure; adds source_identity parameter to publish_as_source()/post_load_publish(); introduces module helpers for metadata construction, signature introspection, layout detection, and SourceIdentity parsing.
ModelLoader orchestration: GMS RO reordering and MX staged path
tensorrt_llm/_torch/pyexecutor/model_loader.py, tensorrt_llm/_torch/memory/gpu_memory_backend.py
Adds MX staged-receiver allow-list and decision helpers to ModelLoader; rewrites the GMS RO sequence to run _setup_aliases → identity check → materialize_module → _walk_cache_state; adds a conditional MX staged-receiver branch (_setup_aliases → _mark_weights_transformed → _walk_cache_state → publish) vs. the full _walk_full_post_load; changes _setup_aliases to walk the module tree; resets _weights_transformed on reload(); updates GMS backend docs to reflect the new ordering.
Unit tests
tests/unittest/_torch/attention/sparse/test_dsa_indexer.py, tests/unittest/_torch/modules/mamba/test_mamba2_mixer.py, tests/unittest/_torch/modules/moe/test_moe_backend.py, tests/unittest/_torch/models/checkpoints/mx/test_mx_checkpoint_loader.py, tests/unittest/_torch/pyexecutor/test_model_loader_gms.py, tests/unittest/_torch/pyexecutor/test_model_loader_mx.py, tests/unittest/_torch/weight_sharing/test_mx_source_identity_gate.py
New and updated tests covering DSA/Mamba2 derived-state caching, MoE/ConfigurableMoE idempotent staging, MX post-transform load/publish metadata assertions, GMS RO exact event-ordering pin, MX staged receiver allowlist validation, transform-flag idempotency for Linear and MLA, and _fetch_source_identity behavior.

Sequence Diagram(s)

sequenceDiagram
  participant ModelLoader
  participant MXCheckpointLoader
  participant MxClient
  participant HfCheckpointLoader
  participant Model

  ModelLoader->>MXCheckpointLoader: load_weights(allow_post_transform_weights=True, source_identity=...)
  MXCheckpointLoader->>MxClient: fetch source metadata
  MxClient-->>MXCheckpointLoader: metadata (layout, protocol_version, serialized identity)
  MXCheckpointLoader->>MXCheckpointLoader: _source_metadata_identity_compatible()
  alt metadata compatible + post_transform layout + allowed
    MXCheckpointLoader->>MxClient: RDMA P2P weight transfer
    MXCheckpointLoader-->>ModelLoader: _post_transform_weights_preloaded=True
    ModelLoader->>Model: _setup_aliases()
    ModelLoader->>Model: _mark_weights_transformed()
    ModelLoader->>Model: _walk_cache_state()
  else incompatible or not allowed
    MXCheckpointLoader->>HfCheckpointLoader: load_weights (disk fallback)
    MXCheckpointLoader-->>ModelLoader: _post_transform_weights_preloaded=False
    ModelLoader->>Model: _walk_full_post_load()
  end
  ModelLoader->>MXCheckpointLoader: post_load_publish(source_identity=...)
  MXCheckpointLoader->>MxClient: publish_model_params(metadata={layout, protocol, identity})
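The two branches of the MX sequence diagram can be sketched as one orchestration function over a module tree. Assumptions: `FakeModule` stands in for `torch.nn.Module`, `finish_load` is a hypothetical name, and the full-load branch is assumed to run `setup_aliases()` and then each module's `transform_weights()` followed by `cache_derived_state()`. The diagram shows only `_walk_full_post_load()` for that branch, so its internals here are a guess. The property to notice is that the preloaded branch marks modules as transformed and never calls `transform_weights()`.

```python
from typing import Iterator, List

EVENTS: List[str] = []  # shared log so the call order is easy to inspect


class FakeModule:
    """Hypothetical stand-in for torch.nn.Module with the staged hooks."""

    def __init__(self, name: str, *children: "FakeModule") -> None:
        self.name = name
        self.children = list(children)
        self._weights_transformed = False

    def modules(self) -> Iterator["FakeModule"]:
        # Pre-order walk, like torch.nn.Module.modules().
        yield self
        for child in self.children:
            yield from child.modules()

    def setup_aliases(self) -> None:
        EVENTS.append(f"alias:{self.name}")

    def transform_weights(self) -> None:
        if self._weights_transformed:
            return
        EVENTS.append(f"transform:{self.name}")
        self._weights_transformed = True

    def cache_derived_state(self) -> None:
        EVENTS.append(f"cache:{self.name}")


def finish_load(model: FakeModule, post_transform_preloaded: bool) -> None:
    """Hypothetical loader tail covering both branches of the diagram."""
    for m in model.modules():
        m.setup_aliases()
    if post_transform_preloaded:
        for m in model.modules():
            m._weights_transformed = True  # analogous to _mark_weights_transformed()
        for m in model.modules():
            m.cache_derived_state()  # analogous to _walk_cache_state()
    else:
        for m in model.modules():  # assumed contents of _walk_full_post_load()
            m.transform_weights()
            m.cache_derived_state()
    EVENTS.append("publish")  # post_load_publish(...) runs on both branches
```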
sequenceDiagram
  participant ModelLoader
  participant GMSBackend
  participant Model

  Note over ModelLoader,Model: GMS Read-Only Path (new ordering)
  ModelLoader->>Model: _setup_aliases()
  ModelLoader->>ModelLoader: _check_gms_source_identity()
  ModelLoader->>GMSBackend: materialize_module(model)
  ModelLoader->>Model: _walk_cache_state() [cache_derived_state per module]
  ModelLoader->>ModelLoader: _post_load_publish(...)
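The GMS read-only ordering lends itself to an exact-order pin, like the "GMS RO exact event-ordering pin" in the test list. A minimal sketch with `unittest.mock`, assuming each step is an injectable callable (the real `ModelLoader` methods and signatures may differ; `gms_read_only_post_load` and `recorded_order` are hypothetical): a single parent `Mock` records its children's calls in order. Assuming the identity check signals a mismatch by raising, the second case shows that nothing is materialized afterwards.

```python
from typing import Callable, List
from unittest import mock


def gms_read_only_post_load(
    setup_aliases: Callable[[], None],
    check_source_identity: Callable[[], None],
    materialize_module: Callable[[], None],
    walk_cache_state: Callable[[], None],
    post_load_publish: Callable[[], None],
) -> None:
    """Hypothetical orchestration following the GMS read-only diagram order."""
    setup_aliases()
    check_source_identity()  # assumed to raise on a missing/mismatched identity
    materialize_module()
    walk_cache_state()
    post_load_publish()


def recorded_order(parent: mock.Mock) -> List[str]:
    # Each mock_calls entry is a (name, args, kwargs) tuple; keep only the names.
    return [name for name, _args, _kwargs in parent.mock_calls]


steps = mock.Mock()
gms_read_only_post_load(steps.setup_aliases, steps.check_source_identity,
                        steps.materialize_module, steps.walk_cache_state,
                        steps.post_load_publish)
assert recorded_order(steps) == ["setup_aliases", "check_source_identity",
                                 "materialize_module", "walk_cache_state",
                                 "post_load_publish"]

failing = mock.Mock()
failing.check_source_identity.side_effect = RuntimeError("SourceIdentity mismatch")
try:
    gms_read_only_post_load(failing.setup_aliases, failing.check_source_identity,
                            failing.materialize_module, failing.walk_cache_state,
                            failing.post_load_publish)
except RuntimeError:
    pass
assert recorded_order(failing) == ["setup_aliases", "check_source_identity"]
```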

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

api-breaking

Suggested reviewers

  • pcastonguay
  • brb-nv
  • galletas1712
  • Funatiq
  • symphonylyh
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 21.29%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title clearly and specifically summarizes the main change: enabling MX post-transform Llama receiver capability as part of Wave 5 of a staged rollout.
  • Description check: ✅ Passed. The PR description comprehensively covers the implementation details, validation, and dependency stack with clear technical explanations and test coverage information.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
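For a rough local view of the Docstring Coverage warning, coverage can be approximated as the share of function and class definitions that carry a docstring. This is a sketch using the standard `ast` module; CodeRabbit's exact counting rules (for example, whether private helpers, nested functions, or test functions count) are not stated here and may differ, so treat the result only as an estimate against the 80% threshold.

```python
import ast
import sys

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def docstring_coverage(source: str) -> float:
    """Return the percentage of function/class definitions in `source` with a docstring."""
    defs = [node for node in ast.walk(ast.parse(source)) if isinstance(node, DEF_NODES)]
    if not defs:
        return 100.0  # nothing to document
    documented = sum(1 for node in defs if ast.get_docstring(node) is not None)
    return 100.0 * documented / len(defs)


if __name__ == "__main__":
    # Usage: python docstring_coverage.py path/to/file.py [more.py ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            print(f"{path}: {docstring_coverage(handle.read()):.2f}%")
```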


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 6

🧹 Nitpick comments (6)
tensorrt_llm/_torch/models/modeling_qwen3_moe.py (1)

420-427: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add an explicit None return annotation to the renamed hook.

setup_aliases is a state-mutating hook and does not return a value.

Proposed fix
-    def setup_aliases(self):
+    def setup_aliases(self) -> None:

As per coding guidelines, “Always annotate functions. Make the return type None if the function does not return anything.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/models/modeling_qwen3_moe.py` around lines 420 - 427, The
setup_aliases method is missing an explicit return type annotation. Add `->
None` to the method signature of setup_aliases to indicate that this
state-mutating hook does not return a value, as per coding guidelines requiring
all functions to have return type annotations.

Source: Coding guidelines

tensorrt_llm/_torch/models/modeling_gpt_oss.py (1)

634-645: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add an explicit None return annotation to the renamed hook.

setup_aliases mutates alias fields and returns nothing; annotating it keeps the new lifecycle hook contract clear.

Proposed fix
-    def setup_aliases(self):
+    def setup_aliases(self) -> None:

As per coding guidelines, “Always annotate functions. Make the return type None if the function does not return anything.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/models/modeling_gpt_oss.py` around lines 634 - 645, The
setup_aliases method lacks an explicit return type annotation. Add `-> None` to
the method signature after the closing parenthesis in the setup_aliases method
definition to indicate that this method mutates state but does not return any
value, keeping the function contract clear and consistent with coding guidelines
that require all functions to be annotated with their return types.

Source: Coding guidelines

tensorrt_llm/_torch/models/modeling_qwen3_next.py (1)

983-990: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add an explicit None return annotation to the renamed hook.

setup_aliases only wires aliases, so the new lifecycle method should be annotated as returning None.

Proposed fix
-    def setup_aliases(self):
+    def setup_aliases(self) -> None:

As per coding guidelines, “Always annotate functions. Make the return type None if the function does not return anything.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/models/modeling_qwen3_next.py` around lines 983 - 990,
The setup_aliases method in the Qwen3 model class lacks an explicit return type
annotation. Add a `-> None` return type annotation to the method signature of
setup_aliases to explicitly indicate that this method does not return any value,
as per the coding guidelines requiring all functions to have return type
annotations.

Source: Coding guidelines

tensorrt_llm/_torch/models/checkpoints/mx/checkpoint_loader.py (1)

201-212: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add return annotations to the new MX lifecycle methods.

These methods expose boolean/side-effect lifecycle contracts, so annotate them explicitly.

Proposed fix
-    def is_post_transform_weights_preloaded(self) -> bool:
+    def is_post_transform_weights_preloaded(self) -> bool:
         """Whether the last successful MX preload delivered transformed bytes.
@@
-    ) -> None:
+    ) -> None:
         """Publish this instance's weights so other ranks can pull via P2P.
@@
-    ) -> None:
+    ) -> None:
         """Publish locally loaded weights as an MX source when appropriate.

As per coding guidelines, “Always annotate functions. Make the return type None if the function does not return anything.”

Also applies to: 561-567, 664-670

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/models/checkpoints/mx/checkpoint_loader.py` around lines
201 - 212, Add explicit return type annotations to all three MX lifecycle
methods to comply with coding guidelines. The method
is_post_transform_weights_preloaded already shows the required -> bool
annotation in the diff; ensure the other two methods at lines 561-567 and
664-670 are annotated -> None, as the proposed fix shows, since these publish
methods return nothing. Verify each method has its return type explicitly
annotated rather than relying on implicit type inference.

Source: Coding guidelines

tensorrt_llm/_torch/modules/fused_moe/interface.py (1)

830-841: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add explicit -> None annotations to the new lifecycle hooks.

These hooks do not return values, so annotate them explicitly.

As per coding guidelines, “Always annotate functions. Make the return type None if the function does not return anything.”

Proposed fix
-    def transform_weights(self):
+    def transform_weights(self) -> None:
         if getattr(self, "_weights_transformed", False):
             return
         self.quant_method.transform_weights(self)
         self._weights_transformed = True
 
-    def cache_derived_state(self):
+    def cache_derived_state(self) -> None:
         self.quant_method.cache_derived_state(self)
 
-    def post_load_weights(self):
+    def post_load_weights(self) -> None:
         self.transform_weights()
         self.cache_derived_state()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/modules/fused_moe/interface.py` around lines 830 - 841,
The methods transform_weights, cache_derived_state, and post_load_weights are
missing explicit return type annotations. Add `-> None` to the method signature
of each of these three methods since they do not return any values. This follows
the coding guideline that all functions must be annotated with their return
type, using `None` when the function does not return anything.

Source: Coding guidelines

tensorrt_llm/_torch/modules/fused_moe/quantization.py (1)

562-580: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Annotate the changed lifecycle hook signatures.

The new/renamed hooks should explicitly return None; Line 1015 should also type module consistently with the other hooks.

As per coding guidelines, “Always annotate functions. Make the return type None if the function does not return anything.”

Proposed signature updates
-    def transform_weights(self, module: torch.nn.Module):
+    def transform_weights(self, module: torch.nn.Module) -> None:

-    def cache_derived_state(self, module: torch.nn.Module):
+    def cache_derived_state(self, module: torch.nn.Module) -> None:

-    def post_load_weights(self, module: torch.nn.Module):
+    def post_load_weights(self, module: torch.nn.Module) -> None:

-    def transform_weights(self, module):
+    def transform_weights(self, module: torch.nn.Module) -> None:

Also applies to: 784-787, 1015-1016, 1280-1300, 3106-3111, 5351-5352

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/modules/fused_moe/quantization.py` around lines 562 -
580, Add explicit return type annotations to the lifecycle hook methods
transform_weights, cache_derived_state, and post_load_weights by appending ->
None to their signatures. Additionally, ensure that the module parameter is
consistently typed as torch.nn.Module across all these hook methods and at the
other locations mentioned (784-787, 1015-1016, 1280-1300, 3106-3111, 5351-5352).
This follows the coding guideline that all functions must be annotated with
their return types, using None when the function does not return a value.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py`:
- Around line 655-667: The _weights_transformed guard flag in
transform_weights() prevents weights from being transformed on subsequent load
cycles because the flag is never reset. When load_weights() is called to load
fresh weights, reset the _weights_transformed flag to False so that the guard
check in transform_weights() will allow the newly loaded weights to be
transformed in the subsequent post_load_weights() call.

In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_cutlass.py`:
- Line 1: The file fused_moe_cutlass.py is flagged as executable but contains no
shebang line, which violates the EXE002 lint rule. Since this is library code
and not intended to be run as a standalone script, remove the executable bit
from the file to resolve the violation. This can be done by changing the file
permissions to make it non-executable using your operating system's file
permission tools.

In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py`:
- Line 1: The file fused_moe_wide_ep.py has the executable bit set but contains
no shebang line, triggering Ruff's EXE002 rule. Since this is a library module
and not intended to be directly executed, remove the executable permission from
the file using the appropriate file permission command (such as chmod -x on
Unix-like systems) rather than adding a shebang.

In `@tensorrt_llm/_torch/modules/linear.py`:
- Around line 383-384: The transform_weights method in the Linear class
currently uses a pass statement which triggers a Ruff B027 lint violation.
Replace the pass statement with an explicit return None to maintain the default
no-op behavior while satisfying the linting requirements. This keeps the
optional hook functional while following the tooling's style guidelines.

In `@tests/unittest/_torch/models/checkpoints/mx/test_mx_checkpoint_loader.py`:
- Around line 491-500: In the test assertion section where the metadata is
validated, after the existing assert statement that checks
`_MX_SOURCE_IDENTITY_METADATA_KEY in metadata`, add another assertion to
validate the actual serialized value stored at that metadata key. The assertion
should compare the value of `metadata[_MX_SOURCE_IDENTITY_METADATA_KEY]` against
the expected serialized representation of the source_identity object used in
this test, ensuring the identity payload is not just present but also correct.
- Around line 429-461: The
test_post_transform_mixed_success_falls_back_to_full_disk_load test only
validates the final fallback behavior but does not explicitly verify that
MxLiveWeightLoader.load_weights was actually invoked during the P2P attempt. Add
a mock/patch for MxLiveWeightLoader.load_weights within the context manager
alongside the existing HfCheckpointLoader patch, then add an assertion after the
result checks to verify this method was called once, ensuring the test validates
the intended "attempt P2P then fallback to disk" behavior rather than just the
final outcome.
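The mixed-success test suggestion can be made concrete with `unittest.mock.patch.object`: patch both loaders, give the P2P path a partial result, then assert that the P2P attempt happened exactly once and that the disk loader reloaded the full set. `LiveP2PLoader`, `DiskLoader`, and `load_with_fallback` are hypothetical stand-ins for `MxLiveWeightLoader`, `HfCheckpointLoader`, and the MX `load_weights()` fallback logic; only the assertion pattern is the point.

```python
from typing import Dict, List, Tuple
from unittest import mock


class LiveP2PLoader:
    """Hypothetical stand-in for MxLiveWeightLoader."""

    def load_weights(self, names: List[str]) -> Dict[str, int]:
        raise NotImplementedError("patched in tests")


class DiskLoader:
    """Hypothetical stand-in for HfCheckpointLoader."""

    def load_weights(self, names: List[str]) -> Dict[str, int]:
        raise NotImplementedError("patched in tests")


def load_with_fallback(names: List[str]) -> Tuple[Dict[str, int], str]:
    """Try P2P first; on partial success, reload every tensor from disk."""
    received = LiveP2PLoader().load_weights(names)
    if set(received) == set(names):
        return received, "p2p"
    return DiskLoader().load_weights(names), "disk"


def test_mixed_success_falls_back_to_full_disk_load() -> None:
    names = ["w0", "w1"]
    with mock.patch.object(LiveP2PLoader, "load_weights", return_value={"w0": 1}) as p2p, \
            mock.patch.object(DiskLoader, "load_weights", return_value={"w0": 1, "w1": 2}) as disk:
        weights, source = load_with_fallback(names)
    assert source == "disk" and weights == {"w0": 1, "w1": 2}
    p2p.assert_called_once_with(names)   # the P2P attempt really happened
    disk.assert_called_once_with(names)  # and the fallback reloaded the full set
```

Because `patch.object` replaces the class attribute with a plain `MagicMock`, the instance call is recorded without `self`, which is why `assert_called_once_with(names)` matches.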

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1c1244f5-435d-4fc9-8b2e-865bc2265648

📥 Commits

Reviewing files that changed from the base of the PR and between 4a8b7af and 14a4537.

📒 Files selected for processing (33)
  • tensorrt_llm/_torch/attention_backend/sparse/dsa.py
  • tensorrt_llm/_torch/memory/gpu_memory_backend.py
  • tensorrt_llm/_torch/models/checkpoints/base_checkpoint_loader.py
  • tensorrt_llm/_torch/models/checkpoints/mx/checkpoint_loader.py
  • tensorrt_llm/_torch/models/modeling_deepseekv3.py
  • tensorrt_llm/_torch/models/modeling_exaone_moe.py
  • tensorrt_llm/_torch/models/modeling_glm.py
  • tensorrt_llm/_torch/models/modeling_gpt_oss.py
  • tensorrt_llm/_torch/models/modeling_llama.py
  • tensorrt_llm/_torch/models/modeling_llama_min_latency.py
  • tensorrt_llm/_torch/models/modeling_qwen3_moe.py
  • tensorrt_llm/_torch/models/modeling_qwen3_next.py
  • tensorrt_llm/_torch/modules/attention.py
  • tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_cute_dsl_b12x.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_cutlass.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_densegemm.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py
  • tensorrt_llm/_torch/modules/fused_moe/interface.py
  • tensorrt_llm/_torch/modules/fused_moe/mega_moe/mega_moe_deepgemm.py
  • tensorrt_llm/_torch/modules/fused_moe/quantization.py
  • tensorrt_llm/_torch/modules/linear.py
  • tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py
  • tensorrt_llm/_torch/pyexecutor/model_loader.py
  • tests/unittest/_torch/attention/sparse/test_dsa_indexer.py
  • tests/unittest/_torch/models/checkpoints/mx/test_mx_checkpoint_loader.py
  • tests/unittest/_torch/modules/mamba/test_mamba2_mixer.py
  • tests/unittest/_torch/modules/moe/test_moe_backend.py
  • tests/unittest/_torch/pyexecutor/test_model_loader_gms.py
  • tests/unittest/_torch/pyexecutor/test_model_loader_mx.py
  • tests/unittest/_torch/weight_sharing/test_mx_source_identity_gate.py

Comment thread tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py Outdated
Comment thread tensorrt_llm/_torch/modules/fused_moe/fused_moe_cutlass.py
Comment thread tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py
Comment thread tensorrt_llm/_torch/modules/linear.py Outdated
@chienchunhung chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch from 14a4537 to 5599299 Compare June 23, 2026 05:10

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #55162 [ run ] triggered by Bot. Commit: 5599299 Link to invocation

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #55179 [ run ] triggered by Bot. Commit: 56c84de Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #55162 [ run ] completed with state ABORTED. Commit: 5599299

Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #55179 [ run ] completed with state SUCCESS. Commit: 56c84de
/LLM/main/L0_MergeRequest_PR pipeline #44147 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chienchunhung chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch 2 times, most recently from 28e8066 to a2a79e7 Compare June 23, 2026 17:02

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #55285 [ run ] triggered by Bot. Commit: a2a79e7 Link to invocation

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
@chienchunhung chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch from 9088c0b to 77eddfa Compare June 23, 2026 17:40

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #55305 [ run ] triggered by Bot. Commit: 77eddfa Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #55285 [ run ] completed with state ABORTED. Commit: a2a79e7

Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #55305 [ run ] completed with state SUCCESS. Commit: 77eddfa
/LLM/main/L0_MergeRequest_PR pipeline #44255 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation
